ABSTRACT

Our model of computation is a parallel computer with k synchronized
processors P_1,...,P_k sharing a common random access storage, where
simultaneous access to the same storage location by two or more processors
is not allowed. Suppose a 2-3 tree T with n leaves is implemented in the
storage, suppose a_1,...,a_k are data that may or may not be stored in the
leaves, and for all i, 1 ≤ i ≤ k, processor P_i knows a_i. We show how to
search for a_1,...,a_k in the tree T, insert these data into the tree,
delete them from the tree, split the tree with respect to these elements
and perform the union of k+1 range-disjoint 2-3 trees in O(log n + log k)
steps.

1.  Introduction 

Technology  will  make  it  possible  to  build  computers  with  a  large 
number  of  cooperating  processors  in  the  near  future.  However,  building 
such  computers  will  only  be  worthwhile  if  the  increased  computing  power 
can  be  used  to  reduce  considerably  the  execution  time  of  sufficiently  many 
basic  computational  problems.  In  particular,  one  would  like  to  have 
data structures where k processors can solve many problems about k times
faster than a single processor. 2-3 trees are one such data structure, as
will be demonstrated here. Protocols that avoid read or write conflicts,
if  several  processors  are  working  simultaneously  on  the  same  balanced 
tree,  have  been  studied  previously  [BS],  [S],  but  apparently  no  attempt 
was  made  to  design  fast  algorithms  and  to  analyze  their  running  time. 

In  the  sequel,  we  say  very  little  about  how  to  avoid  read  or  write 
conflicts.  In  the  situations  where  they  are  possible,  there  are  easy  ways 
to  avoid  them.  We  will,  however,  have  to  say  some  words  about  storage 
allocation. 

2.  2-3  Trees

A 2-3 tree T is a tree in which all leaves have the same depth and
each interior node v has two or three sons: the left son ℓ(v), the right
son r(v), and in case there are three sons, the middle son m(v). Data
from a totally ordered domain are stored in the leaves with smaller data
to the left of larger ones. For each node v, the value L(v) (resp. R(v))
of the largest element stored in the subtree of T with root ℓ(v) (resp.
r(v)) is stored in v. Recall that in the sequential use of 2-3 trees R(v)
is not stored. If v has three sons, then the value M(v) of the largest
element stored in the subtree of T with root m(v) is also stored in v. The
depth of a node v in T is its distance from the root; the height of v is
its distance from the leaves. We assume the reader to be familiar with
the usual search, insertion, deletion and split routines as described, say,
in [AHU].
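
The node layout and the standard sequential search can be sketched as follows. This is only an illustration under assumptions of ours: the class names Leaf and Node, the helper largest, and the use of integers as data are not part of the paper, and the labels are computed once at construction rather than maintained under updates.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Leaf:
    value: int


@dataclass
class Node:
    left: "Tree"
    right: "Tree"
    middle: Optional["Tree"] = None  # present only when v has three sons

    def __post_init__(self):
        # L(v), M(v), R(v): largest element below l(v), m(v), r(v).
        self.L = largest(self.left)
        self.M = largest(self.middle) if self.middle is not None else None
        self.R = largest(self.right)


Tree = Union[Leaf, Node]


def largest(t: Tree) -> int:
    """Largest element stored in the subtree rooted at t."""
    return t.value if isinstance(t, Leaf) else t.R


def search(t: Tree, a: int) -> Leaf:
    """Sequential search: return the leaf at which a arrives."""
    while isinstance(t, Node):
        if a <= t.L:
            t = t.left
        elif t.middle is not None and a <= t.M:
            t = t.middle
        else:
            t = t.right
    return t


# A small tree with leaves 1, 3 | 5, 7, 9.
root = Node(Node(Leaf(1), Leaf(3)), Node(Leaf(5), Leaf(9), Leaf(7)))
```

Here search(root, 4) ends at the leaf holding 5, the smallest stored element that is not below 4, which is the leaf against which an absent element is finally compared.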

Suppose a 2-3 tree T with n leaves is implemented in the storage,
suppose a_1,...,a_k are data that may or may not be stored in the leaves,
suppose a_1 < a_2 < ... < a_k and for all i processor P_i knows a_i. We show how to
perform any of these four operations with respect to the k elements by the
k processors in O(log n + log k) steps. Say that T_0,T_1,...,T_k are k+1 2-3
trees such that the elements which are stored in their leaves are taken
from pairwise disjoint intervals. We show how to perform the union of
these trees into a 2-3 tree in O(log n + log k) steps.

If the elements a_1,...,a_k (or the trees for the union operation)
arrive unsorted they can be sorted in O(log k) time (see [AKS]). Their
solution can readily be adapted to our model of computation using k
processors (see [V2] for similar arguments).

3.  Search 

If simultaneous access by several processors to the same storage
location for read purposes is allowed (as in the PRAM model of parallel
computation of [FW]) then search is very simple. Processor P_i (1 ≤ i ≤ k)
performs the standard sequential search for a_i in O(log n) time ([AHU]).
Since no writes into the shared memory are required, this is done in
parallel by all k processors in time O(log n). Since such simultaneous
reads are not allowed in this presentation, we need another solution.

A chain is a subsequence a_f,a_{f+1},...,a_ℓ of the input sequence
a_1,...,a_k. Such a chain corresponds in a natural way to a chain of
processors P_f,P_{f+1},...,P_ℓ. The search algorithm starts with the chain
a_1,...,a_k at the root of the 2-3 tree T. This chain is subsequently split
into many subchains which wander down the tree. Among the
processors of a chain a_f,...,a_ℓ only the first one, i.e., P_f, is active.
P_f knows ℓ and of course f. If at some time the chain is split into
a_f,...,a_{m-1} and a_m,...,a_ℓ, then processor P_f will invoke processor P_m and
transmit the value ℓ to P_m.

The search algorithm proceeds in stages. During each stage s, the
active processor of each chain C will access the data in some node v of
the 2-3 tree T. We say that C is in node v at stage s. The chain a_1,...,a_k
is in the root at stage 1. During each stage, each active processor
processes its chain once. We describe how this is done.

Suppose a chain C = a_f,...,a_ℓ is in node v at stage s, the node v has
two or three sons and the labels L(v) and possibly M(v) are stored in v.
We say that C hits a label X if a_f ≤ X < a_ℓ. The label R(v) does not
play any role in the present discussion.

Chains C that hit no label are sent to the appropriate son of v; more
precisely: C is at stage s+1 in node

ℓ(v)    if a_ℓ ≤ L(v)

m(v)    if L(v) < a_f and a_ℓ ≤ M(v) and v has 3 sons

r(v)    if M(v) < a_f and v has 3 sons
        or L(v) < a_f and v has 2 sons.

For chains C = a_f,...,a_ℓ let C_1 = a_f,...,a_{m-1} and C_2 = a_m,...,a_ℓ with
m = ⌈(f+ℓ)/2⌉. If C hits a label, then it is split into C_1 and C_2. If
C_i, i = 1,2, hits no label, then it is sent to the appropriate son, else
it remains in v, i.e., C_i is in node v at stage s+1. Clearly, a chain can
be processed in O(1) steps.
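
One processing step of a chain at a node can be sketched as below. This is an illustration only: the function names, the 0-based list a, the inclusive index pairs (f, l), and the string tags for the destinations are our own conventions, and the comparisons follow the reading a_f ≤ X < a_ℓ of "hits".

```python
def hits(a, f, l, label):
    """Chain a[f..l] hits label X if a_f <= X < a_l."""
    return a[f] <= label < a[l]


def route(a, f, l, L, M=None):
    """Son to which a chain that hits no label is sent (M is None for 2 sons)."""
    if a[l] <= L:
        return "left"
    # No label is hit and a_l > L, hence L < a_f.
    if M is not None and a[l] <= M:
        return "middle"
    return "right"


def process(a, f, l, L, M=None):
    """Process chain a[f..l] once at a node with labels L (and M).

    Returns a list of ((first, last), destination) pairs, where the
    destination is 'left', 'middle', 'right', or 'stay' (remain in v).
    """
    labels = [L] if M is None else [L, M]

    def hits_any(g, h):
        return any(hits(a, g, h, x) for x in labels)

    if not hits_any(f, l):
        return [((f, l), route(a, f, l, L, M))]
    m = (f + l + 1) // 2  # ceil((f + l) / 2)
    result = []
    for g, h in ((f, m - 1), (m, l)):
        dest = "stay" if hits_any(g, h) else route(a, g, h, L, M)
        result.append(((g, h), dest))
    return result


a = [1, 4, 6, 9]
first = process(a, 0, 3, L=5, M=7)   # the whole chain hits L(v) = 5
second = process(a, 2, 3, L=5, M=7)  # the right half is processed again
```

In this example the left half (1, 4) leaves for the left son at once, while the right half (6, 9) still hits M(v) = 7 and is split again at the next stage.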

Claim. (a) (resp. (b)). Say that only elements a_f,a_{f+1},...,a_ℓ have
passed through edge e of T through stage s, for any s ≥ 1. If a chain C
such that a_j ∈ C and j > ℓ (resp. j < f) passed through e at stage s+1
then a_{ℓ+1} ∈ C (resp. a_{f-1} ∈ C).

Corollary.  No  more  than  two  chains  may  pass  each  edge  e  of  T  at  any 
single  stage. 

Proof of Claim. By induction on the depth of e in T. The claim
obviously holds for each edge e that emanates from the root of T. This
completes the base of the induction. Assume that both parts (a) and (b)
of the claim hold for all edges of depth d. Let e_1 = (v,w) be an edge of
depth d+1 and e_2 = (u,v) its father edge in T. We will show that part (a)
of the claim holds for e_1. The proof of part (b) is similar.

Elements a_f,a_{f+1},...,a_ℓ (resp. a_j) passed e_2 through stage s-1
(resp. s). By the inductive hypothesis a_{ℓ+1} passed e_2 no later than
stage s.

Case 1.  w is a left son of v.

If a_{ℓ+1} passed e_2 at stage s then, by the inductive hypothesis, a_{ℓ+1}
passed e_2 in the same chain as a_j. If a_{ℓ+1} passed e_2 before stage s then
a_{ℓ+1} and a_j were, again, in the same chain at e_2 since otherwise a_{ℓ+1}
would not have been delayed at v. Since left chops of hit chains are sent
to left sons and a_{ℓ+1} did not pass e_1 before stage s+1, a_{ℓ+1} and a_j pass
e_1 in the same chain at stage s+1.

Case 2.  w is a right son of v.

The chain in which a_ℓ passed e_2 did not contain a_{ℓ+1} because if it
did then a_{ℓ+1} would have passed e_1 not later than a_ℓ. So, since a_j (and
a_{ℓ+1}) could not have been delayed at v, it passed e_2 at stage s and by the
inductive hypothesis its chain included a_{ℓ+1}. This chain passed e_1 at
stage s+1.

Case 3.  w is a middle son of v.

If the chain in which a_ℓ passed e_2 contained a_{ℓ+1} then it must have
contained a_j and large enough elements to hit label M(v); now, if it did
not hit L(v) then the left-chopping arguments (see Case 1) imply that a_{ℓ+1}
and a_j passed e_1 in the same chain. If it hit L(v) then this chain (or
later subchains of it) is cut into pieces that separate a_ℓ and a_j; a_{ℓ+1}
must be in the right one with a_j (otherwise it is sent on e_1 no later than
a_ℓ) and, again, the left-chopping arguments apply. If the chain in which
a_ℓ passed e_2 did not contain a_{ℓ+1} then the analysis of Case 1 applies.

This  completes  the  proof  of  the  claim. 

The corollary implies that for each s and v at most 4 chains are in v
at stage s. Thus, each stage lasts O(1) steps. Once a chain a_f,...,a_ℓ has
arrived in a leaf b, the processors P_{f+1},...,P_ℓ have to be informed of the
value of b. This is done recursively in log k stages. In stage j,
0 ≤ j ≤ ⌈log k⌉ - 1, processor P_i that knows where a_i falls informs P_{i+2^j},
if this latter processor does not know yet where a_{i+2^j} falls. See [V1] for
more details.

Whenever a chain hits a label it is halved. Thus, any
element may be contained in chains that hit labels no more
than ⌈log k⌉ times; therefore it arrives at a leaf in at most
log n + ⌈log k⌉ stages, and the search takes O(log n + log k) time.
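
The doubling scheme that informs the processors of a chain can be simulated sequentially as follows. This is a sketch under our own conventions (the function name broadcast and the dictionary known are hypothetical); it only illustrates that a chain of c processors is informed after ⌈log c⌉ stages and that each uninformed processor is written to by exactly one sender per stage.

```python
def broadcast(f, l, leaf):
    """Inform processors P_f..P_l of the leaf reached by their chain.

    Initially only P_f knows the leaf. In stage j every informed P_i
    informs P_{i + 2^j} if that processor lies in the chain and does
    not know the leaf yet. Returns (knowledge map, number of stages).
    """
    known = {f: leaf}
    j = 0
    while len(known) < l - f + 1:
        # Senders are fixed at the start of the stage (synchronous step).
        for i in list(known):
            target = i + 2 ** j
            if target <= l and target not in known:
                known[target] = known[i]
        j += 1
    return known, j


known, stages = broadcast(3, 10, "b")  # a chain of 8 processors
```

For the chain P_3,...,P_10 the informed set grows as {3}, {3,4}, {3,...,6}, {3,...,10}, so three stages are needed.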

4.  Insertions 

The tree T has n leaves b_1 < b_2 < ... < b_n. The elements a_1,...,a_k
are to be inserted into T.

We first run the search algorithm. This results in splitting the
input into chains (a_i and a_j, 1 ≤ i < j ≤ k, belong to the same chain if
there is no leaf b_q, 1 ≤ q ≤ n, such that a_i < b_q < a_j). There are n+1
possible chains C_0,C_1,...,C_n. Let |C_q| denote the number of input
elements in chain C_q. (For most q, 0 ≤ q ≤ n, |C_q| = 0, since it makes
sense to insert elements into a tree rather than building it from scratch
only if k << n.) We say that by the insertion algorithm elements of chain
C_q (resp. C_0) arrive at leaf b_q (resp. b_1) and fall to its right (resp.
left) hand side, for 1 ≤ q ≤ n.

First, we describe a simple algorithm for the special case |C_0| = 0
and |C_q| ≤ 1 for all 1 ≤ q ≤ n. This algorithm works in stages. In stage
1, for all i processor P_i makes a_i a son of the father of the leaf b_q at
which a_i arrived and then stands by on a_i. Now the algorithm works such
that for all s after stage s the following holds:

All leaves in the tree have the same depth, all interior nodes of
height < s have two or three sons. Between each pair of "old" nodes
(resp. to the right of the rightmost old node) of height s-1, there is at
most one "new" node of height s-1. Each such new node has a processor
standing by. So an (old) node v of height s has at most three new sons.
It also has no more than three old sons. In an obvious way the processor
standing by at a leftmost new son of a node of height s "takes over," while
the other two processors become inactive. In case the total number of old
and new sons of v is ≤ 3 the new son becomes an "ordinary" son of v and
the processor becomes inactive. In case this total number of sons is > 3,
a new internal node v' of height s which becomes the right brother of v is
created and the new and old sons of height s-1 are partitioned properly
among v and v'. The processor then stands by on v'. Updating the L, M
and R fields of v and v' in both cases is easy. So stage s takes O(1)
steps. We showed that in each stage, several new nodes of the tree may be
created simultaneously. We will say later how to do this without
occupying too much storage space.

Let us go back to the general problem of insertion. If |C_0| > 0,
start by inserting a_1 by the sequential algorithm. The new C_0 (with
respect to the new tree and a_2,a_3,...,a_k) satisfies |C_0| = 0.

The problem of inserting a long chain C_j = a_f,...,a_ℓ at leaf b_j, for
1 ≤ j ≤ n, is reduced to the problem of inserting shorter chains. This is
done by first inserting the middle element a_m (m = ⌈(f+ℓ)/2⌉) at leaf b_j
and then inserting recursively a_f,...,a_{m-1} at b_j and a_{m+1},...,a_ℓ at a_m.
This is done for all chains in parallel and the middle elements are
inserted by the simple algorithm described above. After the chains have
been split ⌈log k⌉ times, they are reduced to length one. Thus, running
first the algorithm for C_0 and then the simple algorithm ⌈log k⌉ times
would do the job in O(log n log k) steps. For i ≤ ⌈log k⌉, let T_i be the
tree obtained by running the algorithm for C_0 and then the simple
algorithm i times. Now for all i running the simple algorithm the i'th
time results in a wave of processors running up T_{i-1} at a speed of one
level per stage, and below this wave, the tree already looks like T_i.
Thus, pipelining can be applied: before starting the
(i+1)-st run of the simple algorithm and with it the (i+1)-st wave of
processors, one need not wait until the i'th wave has reached the root,
but only long enough to ensure that the two waves will not overlap. Three
stages will certainly suffice.
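
The effect of pipelining on the stage count can be made concrete with a small calculation. The model below is a simplification of ours, not the paper's accounting: every wave is assumed to need exactly `height` stages to climb the tree, the number of waves is taken as ⌈log k⌉ (at least one), and consecutive waves start `gap` = 3 stages apart. The function name stage_counts is hypothetical.

```python
def stage_counts(height, k, gap=3):
    """Return (stages without pipelining, stages with pipelining).

    height: number of levels a wave has to climb (about log n)
    k:      number of elements to insert
    gap:    stages between the starts of consecutive waves
    """
    waves = max(1, (k - 1).bit_length())  # ceil(log2 k), at least one wave
    unpipelined = waves * height          # each wave waits for the previous one
    pipelined = gap * (waves - 1) + height  # last wave starts after gap*(waves-1)
    return unpipelined, pipelined


counts = stage_counts(height=20, k=1024)
```

With height 20 and k = 1024 this gives 200 stages without pipelining against 47 with it, matching the drop from O(log n log k) to O(log n + log k).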

5.  Deletions 

For deleting the elements a_1 < a_2 < ... < a_k from the 2-3 tree T, we
first run the search algorithm. Similar to the description of the
insertion algorithm we have elements of chain C_q (resp. C_0) which arrive
at leaf b_q (resp. b_1) and fall to its right (resp. left) hand side, for
1 ≤ q ≤ n. C_0 cannot contain an element which is in a leaf and therefore
can be ignored. For each (non-empty) chain C_q (1 ≤ q ≤ n) the processor
of its left-most element checks if its element is the same as the one
stored in leaf b_q. If it is so, the processor marks leaf b_q for deletion
and stands by on this leaf. All other processors become inactive.

First,  we  describe  a  simple  algorithm  for  the  special  case  where  each 
node  of  height  1  has  at  least  one  non-marked  son.  This  algorithm  works  in 
stages. 

In  stage  1  all  the  marked  leaves  are  deleted.  Following  stage  1  a 
node  of  height  1  may  either  remain  a  node  of  height  1  or  become  a  node  of 
height  0  (a  leaf).  Say  that  a  node  v  of  height  1  had  one  or  two 
son-leaves  which  were  deleted.  If  it  remained  of  height  1  (it  is  possible 
if  it  had  three  sons  and  only  one  was  deleted)  then  the  processor  of  this 
deleted  leaf  becomes  inactive.  Otherwise  the  processor  of  a  deleted 
son-leaf  stands  by  on  v  and  v  is  marked.  The  processor  of  a  (possibly 
existing)  second  deleted  son-leaf  becomes  inactive. 

The algorithm works such that, for all s (s ≥ 1), the following holds
after stage s:

1.  Each marked node is a root of a 2-3 tree of height s-1 and is
a son of a node of height s+1 in T. It has a processor standing
by on it.

2.  Each internal node has two or three sons.

3.  Each node of height s in T which is not marked is a root of
a 2-3 tree of height s.

It is easy to verify that each node v of height s+1 in T which has a
marked son must have between two and seven sons and grandchildren of
height s-1. All but one of the processors which stand by on a marked
son of v become inactive. This processor does the following in stage s+1:

-  if v has ≥ 4 sons and grandchildren of height s-1 then they are
partitioned in the usual way into sons of v so as to make v of height s+1
as before. The processor becomes inactive.

-  else the nodes of height s-1 become sons of v, in the usual way, v
is marked and the processor stands by on v.

Care has to be taken in order to avoid read or write conflicts and
to choose at each stage the processors which become inactive. This, as
well as updating the L, M and R fields, is easy. The algorithm runs in
O(log n) time.

Let us go back to the general case. We run the following algorithm
A. It works in stages. Denote our 2-3 tree T by T_0. Let T_t be the 2-3
tree which is the output of stage t for t ≥ 1.

Stage t  (t ≥ 1)

T_{t-1} is the input 2-3 tree for stage t. For each node of height 1 in
T_{t-1} such that all its son-leaves have to be deleted, mark all leaves but
one for deletion. For each node such that not all its son-leaves have to
be deleted, mark the ones that have to be deleted. Processors of these
marked leaves stand by on them. Processors of leaves that have to be
deleted but have not yet been marked do not take part in the rest of this
stage. The stage now proceeds in the same way as the algorithm for the
special case given above.

In each stage we delete at least half of the leaves that have to be
deleted but have not been deleted by previous stages. Therefore,
algorithm A runs in O(log k) stages. Similar to Section 4 (Insertions)
we pipeline the stages of algorithm A, thereby obtaining an overall time
complexity of O(log n + log k).
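
The halving claim follows directly from the marking rule of a stage, which the following sketch makes explicit. The representation is our own: each node of height 1 is given as a pair (number of son-leaves, number of son-leaves still to be deleted), and the function name marks_per_node is hypothetical.

```python
def marks_per_node(nodes):
    """Number of leaves marked in one stage below each node of height 1.

    nodes: list of (sons, to_delete) pairs with 2 <= sons <= 3 and
    0 <= to_delete <= sons. If all sons are to be deleted, all but one
    are marked; otherwise every leaf that is to be deleted is marked.
    """
    return [d - 1 if d == sons else d for sons, d in nodes]


nodes = [(3, 3), (2, 2), (3, 1), (2, 0)]
marked = marks_per_node(nodes)
pending = sum(d for _, d in nodes)
```

A node whose sons all have to go has at least two sons, so it still marks at least half of them; every other node marks all of its pending leaves. Summing over the nodes, at least half of the pending leaves are deleted in the stage.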

6.  Splits 

Given k elements a_1 < a_2 < ... < a_k we would like to split the tree T with
respect to them. Namely, if the leaves of T represent a set of elements S
then the output of the split operation is k+1 2-3 trees T_0,T_1,...,T_k such
that the leaves of tree T_i represent the elements of S_i, where S_i = {a ∈ S:
a_i < a < a_{i+1}} for 1 ≤ i ≤ k-1, S_0 = {a ∈ S: a < a_1} and S_k = {a ∈ S: a > a_k}.

In order to clarify the presentation, let us start with a solution
for the corresponding sequential split problem, i.e., we give an algorithm
which employs a single processor for the splitting of a 2-3 tree with
respect to a single element.

1.  Using the standard search algorithm, find the path
π(a) = (p_1,...,p_r) from the root of T to a. Delete the whole path π(a) and
all edges adjacent to it from T.

One is left with a forest of subtrees of T some of which were to the
left of the path π(a) and the others were to the right of π(a). Let us
call these subtrees of T the left (resp. right) side trees of path π(a).
If a was not stored in the tree, then a was eventually compared to a leaf
b of T with the result a < b or a > b. In the first [second] case treat
b, and possibly its right [left] brothers, as right [left] side trees of
π(a).

2.  Join the left (resp. right) subtrees of π(a) into a 2-3 tree L
(resp. R).

We describe this step for the left subtrees only. Right subtrees are
handled similarly. This is done in stages which are described
recursively. Before starting stage s we have a 2-3 tree S that contains
the leaves of all left subtrees of height ≤ s-1. The height of S is
h ≤ s. There are either none or one or two left subtrees of height s. In
the second (resp. third) case we denote them L (resp. L_1 and L_2). In
the first case S is the "output" of stage s. In the third case L_1 and L_2
are joined into a 2-3 tree of height s+1 in the obvious way. Let us
denote this new tree by L, too. No confusion will arise. In both the
second and third cases we reach, down the rightmost path in L, a node of
height h+1. The root of S becomes its son and the insertion is propagated
in the standard way up in L. (The possibility where S is of height s and
we are in the second case is simple and has to be added to the
description.) This completes the description of one stage. Note that the
height of the new S is ≤ s+1.

Throughout step 2 we did not visit any level more than O(1) times; in stage
s we never go below the height h that S had before the stage started. Each time
we visit a level, O(1) operations are performed. Therefore, Step 2 takes
O(log n) time.

We now parallelize this algorithm in order to split simultaneously a
2-3 tree T with respect to elements a_1,...,a_k:

1.  Run the search algorithm for a_1,...,a_k and mark the paths
π(a_1),...,π(a_k). As we do not require the elements a_i
to be stored in the tree T, these paths are not necessarily
distinct.

For all i, we define the left [right] forest LF(i) [RF(i)] of path
π(a_i) as the set of left [right] side trees of π(a_i) whose root is not
marked and that are not left [right] side trees of π(a_{i-1}) [π(a_{i+1})]. The
example of Fig. 1 shows the only case where RF(i) and LF(i+1) may have a
tree in common. "Correct" our definition for this case, so that this tree
belongs to LF(i+1) only.

2.  Rerun the search algorithm, but for all chains a_f,...,a_ℓ
that are created have the processors P_f and P_ℓ both active.
Processor P_f [P_ℓ] keeps track of which trees are in
LF(f) [RF(ℓ)]. Also delete in this run the paths
π(a_1),...,π(a_k) and the adjacent edges. For all chains
a_f,...,a_ℓ that have reached a leaf only processors P_f and P_ℓ
remain active. P_f remembers the index ℓ of the next active
processor. The following two commands are executed by active
processors only, similar to Step 2 of the sequential algorithm.

3.  For all P_f processors: join the left forest of path π(a_f) into
a 2-3 tree L_{f-1}.

4.  For all P_ℓ processors: join the right forest of path
π(a_ℓ) into a 2-3 tree R_ℓ. (All R_i and L_i that were
not affected by the last two commands are empty.)

5.  (a)  Processor P_1: Insert L_0 into T_0.
Processor P_k: Insert R_k into T_k.

(b)  For all P_i processors (1 ≤ i ≤ k-1): Join L_i and R_i
into a 2-3 tree T_i. (The "insertion" into the T_i
trees should be understood as renaming rather than copying.)

The  split  algorithm  takes  O(log  n  +  log  k)  time. 

Note that throughout this section we omitted the updates of the fields
L(v), M(v) and R(v). It is always easy to complete these details.

7.  Unions 


Let T_0,T_1,...,T_k be 2-3 trees such that the elements that are
represented by their leaves satisfy the following: if a leaf of T_i
(resp. T_j) represents element a_i (resp. a_j) and
0 ≤ i < j ≤ k then a_i < a_j. Our problem is to join T_0,T_1,...,T_k
into a new 2-3 tree. As was done before, we start with a solution of the
corresponding sequential union problem: Join T_0 and T_1 into a 2-3
tree.

A fairly awkward way of performing this is given. However, this technique
is useful in the parallel algorithm since it enables pipelining. Start at the
rightmost leaf of T_0 and the leftmost leaf of T_1. Climb, level by level,
simultaneously in T_0 and T_1 till the first root of either T_0 or T_1 is
encountered. Then join T_0 and T_1 to one tree and propagate the update, in the
standard way, to form a 2-3 tree. The algorithm requires O(log n) time.
See [AHU] for all "standard ways" mentioned above.

We now parallelize and pipeline this algorithm in order to compute the
union of T_0,T_1,...,T_k. Let us denote them by
T_0^{(0)},T_1^{(0)},...,T_k^{(0)}, respectively. This is done in
phases j = 0,1,...

For any tree T let λ(T) (resp. ρ(T)) denote the leftmost (resp. rightmost)
leaf of T. For all odd i, processor P_i determines: λ(T_{i-1}^{(0)}),
ρ(T_{i-1}^{(0)}), λ(T_i^{(0)}) and ρ(T_i^{(0)}).

The following is true for j = 0 and will remain true: At the beginning of
phase j we are left with ⌈(k+1)/2^j⌉ trees T_0^{(j)},T_1^{(j)},... where
for each v ∈ {0,...,⌈(k+1)/2^j⌉ - 1} the tree T_v^{(j)} is a 2-3 tree obtained
by joining T_{v 2^j},...,T_{(v+1) 2^j - 1}. For each odd v we have not yet used
processor P_{v 2^j}, and processor P_{v 2^j} knows ρ(T_{v-1}^{(j)}) and
λ(T_v^{(j)}).

The following is done in phase j: For each odd v processor P_{v 2^j} runs
up the right branch of T_{v-1}^{(j)} and the left branch of T_v^{(j)} and
joins the two trees into T_{(v-1)/2}^{(j+1)} as in Step 3 of the
sequential algorithm. This processor also sets

λ(T_{(v-1)/2}^{(j+1)}) =  λ(T_{v-1}^{(j)})   if T_{v-1}^{(j)} ≠ ∅,
                          λ(T_v^{(j)})       otherwise,

and

ρ(T_{(v-1)/2}^{(j+1)}) =  ρ(T_v^{(j)})       if T_v^{(j)} ≠ ∅,
                          ρ(T_{v-1}^{(j)})   otherwise.


Finally,  observe  that  the  phases  can  be  pipelined,  i.e.,  for  all  j, 
phase  j+1  can  be  started  a  constant  number  of  steps  after  phase  j.  The 
union  algorithm  takes  O(log  n  +  log  k)  time. 
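
The index bookkeeping of the phases can be checked with a short sketch. It assumes the scheme as we read it from the text: tree T_v^{(j)} is the join of T_{v 2^j},...,T_{(v+1) 2^j - 1} (truncated at T_k), and in phase j the join for each odd v is carried out by processor P_{v 2^j}. The function name phase_layout is hypothetical.

```python
def phase_layout(k, j):
    """Describe phase j of the union of T_0..T_k.

    Returns (trees, joiners): trees[v] lists the indices of the original
    trees contained in T_v^{(j)}; joiners lists the indices of the
    processors that perform a join in this phase (one per odd v).
    """
    step = 2 ** j
    count = -(-(k + 1) // step)  # ceil((k + 1) / 2^j)
    trees = [list(range(v * step, min((v + 1) * step, k + 1)))
             for v in range(count)]
    joiners = [v * step for v in range(1, count, 2)]
    return trees, joiners


layouts = [phase_layout(5, j) for j in range(4)]  # six trees T_0..T_5
```

For k = 5 the joining processors are P_1, P_3, P_5 in phase 0, P_2 in phase 1 and P_4 in phase 2, so under this reading every processor is used exactly once and a single tree remains after three phases.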

Note that throughout this section we omitted the updates of the fields
L(v), M(v) and R(v). It is always easy to complete these details.

It is interesting to note that performing first the split algorithm
and then the union algorithm yields an alternative deletion algorithm.

8.  Storage  Allocation

Nodes of a 2-3 tree with n leaves are stored in the first, say N,
locations of some vector A. During the insertion algorithm, each processor
P_i may create n_i ≤ log n new nodes. Therefore, for each processor P_i,
log n consecutive locations of some other vector B are reserved, where the
nodes it creates are stored. After the insertion
algorithm, the numbers N_i = Σ_{j<i} n_j are computed in parallel (in O(log k) time) and
for all i processor P_i copies the nodes that it created into rows
N+N_i+1,...,N+N_i+n_i of A.
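
The offsets N_i are prefix sums, which k processors can obtain in ⌈log k⌉ rounds of pairwise additions. The sketch below simulates one standard doubling scheme sequentially; the paper does not say which scheme it has in mind, so the choice, the function names, and the 1-based row numbering of A are assumptions of ours.

```python
def exclusive_prefix_sums(n):
    """Return (offsets, rounds) with offsets[i] = n[0] + ... + n[i-1].

    Simulates the doubling scheme: in the round with distance d, position
    i adds the partial sum held at position i - d (if it exists).
    """
    k = len(n)
    s = list(n)
    d, rounds = 1, 0
    while d < k:
        s = [s[i] + (s[i - d] if i >= d else 0) for i in range(k)]
        d *= 2
        rounds += 1
    return [s[i] - n[i] for i in range(k)], rounds


def target_rows(N, n):
    """Rows of A into which each processor copies its new nodes."""
    offsets, _ = exclusive_prefix_sums(n)
    return [list(range(N + o + 1, N + o + c + 1)) for o, c in zip(offsets, n)]


offsets, rounds = exclusive_prefix_sums([2, 0, 3, 1])
rows = target_rows(10, [2, 0, 3, 1])
```

With N = 10 and node counts 2, 0, 3, 1 the processors receive the disjoint row ranges 11-12, none, 13-15 and 16, so A stays contiguous.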

During command 2 of the split algorithm, each processor P_i may cancel
m_i ≤ log n nodes, i.e., locations in A. Each processor P_i stores the
numbers of these rows in its private section in the vector B. After
command 2 of the split algorithm M = Σ_{i=1}^{k} m_i is computed. Now the rows
with numbers > N-M that were not cancelled have to be copied into the rows
with numbers ≤ N-M that have been cancelled: locations N-M+1,...,N of A
are partitioned into blocks B_1,...,B_k, each consisting of at most log n
consecutive locations of A. Each processor P_i determines the number d_i of
locations in B_i that were not cancelled. The numbers D_i = Σ_{j<i} d_j are
computed. Let ρ_i denote the set of rows with numbers ≤ N-M that P_i
cancelled, and let r_i = |ρ_i|. The numbers R_i = Σ_{j<i} r_j are computed.
Each processor P_i writes the indices in ρ_i in
places R_i+1,...,R_i+r_i of some vector C. Once all processors are done with
this, P_i copies the locations of block B_i that were not cancelled into those
locations of A whose indices are in places D_i+1,...,D_i+d_i of vector C.
Similar considerations apply for the deletion algorithm.
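
The compaction step can be illustrated sequentially as follows. This is only a sketch: it ignores the partition of the work among processors and the prefix sums D_i and R_i, and simply pairs every surviving row above N-M with a cancelled row at or below N-M. The function name compact, the Python list standing in for A, and the 1-based row numbers are our own conventions.

```python
def compact(A, cancelled):
    """Move surviving rows numbered > N-M into cancelled rows <= N-M.

    A:         list of node records, row r stored at A[r - 1]
    cancelled: set of 1-based row numbers that were cancelled
    Returns the first N-M rows of A after compaction.
    """
    N, M = len(A), len(cancelled)
    # Free slots in the part of A that is kept (the contents of vector C).
    holes = sorted(r for r in cancelled if r <= N - M)
    # Rows in the part that is given up which still hold live nodes.
    survivors = [r for r in range(N - M + 1, N + 1) if r not in cancelled]
    for hole, src in zip(holes, survivors):
        A[hole - 1] = A[src - 1]
    return A[:N - M]


result = compact(list("abcdefgh"), {2, 5, 7})
```

The two lists always have the same length: exactly M rows are cancelled in total, so the number of cancelled rows at or below N-M equals the number of surviving rows above N-M.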

Later in the split algorithm, every processor may create O(log n)
new nodes. Storage allocation is handled as in the case of insertions.
This applies to the deletion and union algorithms as well.